Adversarial Blocking Bandits
We consider a general adversarial multi-armed blocking bandit setting where each played arm can be blocked (unavailable) for some time periods and the reward per arm is given at each time period adversarially, without obeying any distribution. The setting models scenarios of allocating scarce supplies (e.g., arms) that replenish and can be reused only after certain time periods. We first show that, in the optimization setting, when the blocking durations and rewards are known in advance, finding an optimal policy (i.e., determining which arm to play in each round) that maximises the cumulative reward is strongly NP-hard, eliminating the possibility of a fully polynomial-time approximation scheme (FPTAS) for the problem unless P = NP. To complement this result, we show that a greedy algorithm that plays the best available arm at each round provides an approximation guarantee that depends on the blocking durations and the path variation of the rewards. In the bandit setting, when the blocking durations and rewards are not known, we design two algorithms, RGA and RGA-META, for the case of bounded duration and path variation.
- North America > United States (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
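The greedy baseline in the abstract above is simple to state. Below is a minimal sketch of it in the offline setting the paper analyses, where rewards and blocking durations are known in advance; the list-of-lists inputs and the exact blocking convention (an arm played at round t with duration d becomes available again at round t + d) are illustrative assumptions, not the paper's formal model.

```python
from typing import List

def greedy_blocking(rewards: List[List[float]], durations: List[List[int]]) -> float:
    """Play the best available arm at each round.

    rewards[t][k]   -- reward of arm k at round t (known in advance here).
    durations[t][k] -- blocking duration if arm k is played at round t.
    Returns the cumulative reward collected by the greedy policy.
    """
    T, K = len(rewards), len(rewards[0])
    blocked_until = [0] * K  # first round at which each arm is available again
    total = 0.0
    for t in range(T):
        available = [k for k in range(K) if blocked_until[k] <= t]
        if not available:  # every arm is currently blocked this round
            continue
        best = max(available, key=lambda k: rewards[t][k])  # best available arm
        total += rewards[t][best]
        blocked_until[best] = t + durations[t][best]  # blocked until round t + d
    return total

# Example: 2 arms over 3 rounds; playing any arm blocks it for one round.
print(greedy_blocking([[1.0, 0.5], [0.9, 0.2], [0.3, 0.8]],
                      [[2, 2], [2, 2], [2, 2]]))
```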
SOReL and TOReL: Two Methods for Fully Offline Reinforcement Learning
Fellows, Mattie, Wibault, Clarisse, Berdica, Uljad, Forkel, Johannes, Osborne, Michael A., Foerster, Jakob N.
Sample efficiency remains a major obstacle to real-world adoption of reinforcement learning (RL): success has been limited to settings where simulators provide access to essentially unlimited environment interactions, which in reality are typically costly or dangerous to obtain. Offline RL in principle offers a solution by exploiting offline data to learn a near-optimal policy before deployment. In practice, however, current offline RL methods rely on extensive online interactions for hyperparameter tuning, and have no reliable bound on their initial online performance. To address these two issues, we introduce two algorithms. Firstly, SOReL: an algorithm for safe offline reinforcement learning. Using only offline data, our Bayesian approach infers a posterior over environment dynamics to obtain a reliable estimate of online performance via the posterior predictive uncertainty. Crucially, all hyperparameters are also tuned fully offline. Secondly, we introduce TOReL: a tuning-for-offline-reinforcement-learning algorithm that extends our information-rate-based offline hyperparameter tuning method to general offline RL approaches. Our empirical evaluation confirms SOReL's ability to accurately estimate regret in the Bayesian setting, whilst TOReL's offline hyperparameter tuning achieves performance competitive with the best online hyperparameter tuning methods using only offline data. Thus, SOReL and TOReL make a significant step towards safe and reliable offline RL, unlocking the potential of RL in the real world. Our implementations are publicly available: https://github.com/CWibault/sorel_torel.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > Illinois (0.04)
- (3 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
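The core idea in SOReL, as described in the abstract above, is to estimate online performance from offline data by propagating posterior uncertainty over the environment dynamics. Below is a minimal sketch of that posterior-predictive evaluation idea under strong simplifying assumptions: a tabular MDP, posterior samples supplied as a list of transition tensors, and a deterministic tabular policy. None of these names or modelling choices come from the paper itself.

```python
import numpy as np

def posterior_predictive_return(policy, dynamics_samples, rewards,
                                s0=0, horizon=50, gamma=0.99, seed=0):
    """Estimate a policy's online return before deployment.

    policy[s]        -- action chosen in state s (deterministic, tabular).
    dynamics_samples -- list of transition tensors P[s, a, s'] drawn from a
                        posterior over environment dynamics.
    rewards[s, a]    -- known (or separately estimated) reward table.
    Returns the mean and std of the discounted return across posterior
    samples; the std plays the role of posterior predictive uncertainty.
    """
    rng = np.random.default_rng(seed)
    returns = []
    for P in dynamics_samples:  # one sampled environment per posterior draw
        s, ret, disc = s0, 0.0, 1.0
        for _ in range(horizon):
            a = policy[s]
            ret += disc * rewards[s, a]
            disc *= gamma
            s = rng.choice(P.shape[2], p=P[s, a])  # sample next state
        returns.append(ret)
    return float(np.mean(returns)), float(np.std(returns))
```

Under these assumptions, fully offline hyperparameter tuning reduces to evaluating each candidate policy this way and selecting the one with the best estimated return (or smallest estimated regret against a reference), with no online interaction.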
On preserving non-discrimination when combining expert advice
Blum, Avrim, Gunasekar, Suriya, Lykouris, Thodoris, Srebro, Nathan
The emergence of machine learning in the last decade has given rise to an important debate regarding the ethical and societal responsibility of its offspring. Machine learning has provided a universal toolbox enhancing decision making in many disciplines, from advertising and recommender systems to education and criminal justice. Unfortunately, both the data and their processing can be biased against specific population groups (even inadvertently) at every single step of the process [BS16]. This has generated societal and policy interest in understanding the sources of this discrimination, and interdisciplinary research has attempted to mitigate its shortcomings. Discrimination is commonly an issue in applications where decisions need to be made sequentially. The most prominent such application is online advertising, where platforms need to sequentially select which ad to display in response to particular search queries. This process can introduce discrimination against protected groups in many ways, such as filtering out particular alternatives [DTD15, APJ16] and reinforcing existing stereotypes through search results [Swe13, KMM15]. Another canonical example of sequential decision making is medical trials, where under-exploration on female groups often leads to significantly worse treatments for them [LDM16]. Similar issues occur in image classification, as stressed by "gender shades" [BG18].
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > California (0.04)
- Law (0.87)
- Health & Medicine (0.87)
Learning in Games: Robustness of Fast Convergence
Foster, Dylan J., Li, Zhiyuan, Lykouris, Thodoris, Sridharan, Karthik, Tardos, Eva
We show that learning algorithms satisfying a low approximate regret property experience fast convergence to approximate optimality in a large class of repeated games. Our property, which simply requires that each learner has small regret compared to a (1+ε)-multiplicative approximation to the best action in hindsight, is ubiquitous among learning algorithms; it is satisfied even by the vanilla Hedge forecaster. Our results improve upon recent work of Syrgkanis et al. in a number of ways. We require only that players observe payoffs under other players' realized actions, as opposed to expected payoffs. We further show that convergence occurs with high probability, and show convergence under bandit feedback. Finally, we improve upon the speed of convergence by a factor of n, the number of players. Both the scope of settings and the class of algorithms for which our analysis provides fast convergence are considerably broader than in previous work. Our framework applies to dynamic population games via a low approximate regret property for shifting experts. Here we strengthen the results of Lykouris et al. in two ways: we allow players to select learning algorithms from a larger class, which includes a minor variant of the basic Hedge algorithm, and we increase the maximum churn in players for which approximate optimality is achieved. In the bandit setting we present a new algorithm which provides a "small loss"-type bound with improved dependence on the number of actions in utility settings, and is both simple and efficient. This result may be of independent interest.
- North America > United States > New York > New York County > New York City (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
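The "vanilla Hedge forecaster" mentioned in the abstract above is the classic exponential-weights algorithm; a minimal sketch follows, with a placeholder loss matrix as the only assumed input. With learning rate η on the order of ε, Hedge's cumulative loss is at most (1+ε) times that of the best action in hindsight plus O(log K / ε), which is the kind of low approximate regret guarantee the paper builds on.

```python
import numpy as np

def hedge(losses: np.ndarray, eta: float = 0.1) -> np.ndarray:
    """Vanilla Hedge (exponential weights) over a known loss matrix.

    losses[t, k] -- loss of action k at round t, assumed to lie in [0, 1].
    Returns the T x K array of probability distributions played.
    """
    T, K = losses.shape
    weights = np.ones(K)
    plays = np.empty((T, K))
    for t in range(T):
        plays[t] = weights / weights.sum()   # play proportional to weights
        weights *= np.exp(-eta * losses[t])  # multiplicative update
    return plays

# Example: two actions over three rounds; Hedge shifts mass to action 1,
# which suffers no loss.
print(hedge(np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])))
```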